Multivariate Normal distribution (multivariate_normal)#

The multivariate normal (a.k.a. multivariate Gaussian) is the canonical distribution for continuous, correlated random vectors. It generalizes the univariate normal by replacing the scalar variance with a covariance matrix.

It is foundational in statistics and machine learning, appearing in linear regression, Kalman filtering, Gaussian processes, LDA/QDA, and PCA, and serving as the building block for Gaussian mixture models.

import numpy as np

import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

import scipy
from scipy import stats
from scipy.stats import multivariate_normal

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=5, suppress=True)
rng = np.random.default_rng(42)

print("NumPy ", np.__version__)
print("SciPy ", scipy.__version__)
NumPy  1.26.2
SciPy  1.15.0

Notebook roadmap#

  1. Definition and parameter space

  2. Intuition and relationships

  3. PDF / CDF

  4. Moments and key properties

  5. How parameters change the shape

  6. Likelihood + MLE sketch

  7. NumPy-only sampling

  8. Visualizations (PDF, CDF, samples)

  9. SciPy integration (scipy.stats.multivariate_normal)

  10. Statistical use cases

  11. Pitfalls

  12. Summary

1) Title & Classification#

| Item | Value |
|---|---|
| Name | Multivariate Normal (multivariate_normal) |
| Type | Continuous |
| Dimension | \(d \in \mathbb{N}\) (fixed) |
| Support | \(x \in \mathbb{R}^d\) |
| Parameters | mean \(\mu \in \mathbb{R}^d\), covariance \(\Sigma \in \mathbb{S}_{++}^d\) |

Here \(\mathbb{S}_{++}^d\) denotes the set of symmetric positive definite \(d\times d\) matrices.

SciPy note: scipy.stats.multivariate_normal also supports allow_singular=True, which relaxes \(\Sigma\) to be positive semidefinite.
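A quick toy illustration of allow_singular (the rank-one covariance here is chosen just for demonstration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Rank-1 (singular) covariance: X2 equals X1 almost surely
mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])

# Without allow_singular=True construction raises a LinAlgError;
# with it, SciPy uses the pseudo-inverse and pseudo-determinant of Sigma.
rv = multivariate_normal(mean=mu, cov=Sigma, allow_singular=True)
print("pdf at mean:", rv.pdf(mu))

# Samples are confined to the support (the line x1 = x2)
s = rv.rvs(size=3, random_state=np.random.default_rng(0))
print(s)
```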

2) Intuition & Motivation#

What it models#

A multivariate normal models a random vector whose uncertainty looks like an ellipsoid in \(\mathbb{R}^d\). The key geometric object is the Mahalanobis distance

\[ \delta^2(x) = (x-\mu)^\top \Sigma^{-1} (x-\mu), \]

which replaces the “\(z\)-score squared” from the univariate normal.

A very useful generative story:

  1. Draw \(Z \sim \mathcal{N}(0, I_d)\) (independent standard normals).

  2. Choose a matrix \(L\) such that \(\Sigma = L L^\top\) (e.g. Cholesky factor).

  3. Set \(X = \mu + L Z\).
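These three steps are one line each in NumPy (a minimal sketch; the mvn_rvs helper later in the notebook packages the same idea in vectorized form):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])

Z = rng.standard_normal(2)     # step 1: Z ~ N(0, I)
L = np.linalg.cholesky(Sigma)  # step 2: Sigma = L L^T
X = mu + L @ Z                 # step 3: X ~ N(mu, Sigma)
print(X)
```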

Typical real-world use cases#

  • Measurement noise with correlated sensors / features

  • Kalman filters and state-space models (Gaussian noise + linear dynamics)

  • Gaussian processes: any finite set of function values is multivariate normal

  • LDA/QDA: class-conditional feature distributions are modeled as Gaussian

  • PCA / factor models: covariance structure is the main object

Relations to other distributions#

  • Univariate normal is the special case \(d=1\).

  • Linear transformations preserve normality: if \(X\sim\mathcal{N}(\mu,\Sigma)\), then \(AX+b\) is normal.

  • Conditionals and marginals of a multivariate normal are again (multi)variate normals.

  • The quadratic form \(\delta^2(X)\) has a chi-square distribution: \(\delta^2(X) \sim \chi^2_d\).

  • The sample covariance of Gaussian data is Wishart-distributed.

(A deeper fact: among all continuous distributions with a given mean and covariance, the multivariate normal has maximum entropy.)

3) Formal Definition#

Let \(d\) be the dimension, \(\mu\in\mathbb{R}^d\), and \(\Sigma\in\mathbb{S}_{++}^d\). We write

\[ X \sim \mathcal{N}(\mu, \Sigma). \]

PDF#

The probability density function (PDF) is

\[ f(x\mid\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right), \qquad x\in\mathbb{R}^d. \]

A numerically stable way to work is the log-density:

\[ \log f(x) = -\tfrac{1}{2}\Big(d\log(2\pi) + \log|\Sigma| + (x-\mu)^\top\Sigma^{-1}(x-\mu)\Big). \]

CDF#

The multivariate CDF is defined by integrating the PDF over the semi-infinite lower-left region \(\{u : u_1 \le x_1, \ldots, u_d \le x_d\}\):

\[ F(x_1,\ldots,x_d) = \mathbb{P}(X_1\le x_1,\ldots,X_d\le x_d) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_d} f(u)\,du. \]

For \(d>1\), there is no general closed form; practical evaluation uses numerical methods.

def _as_2d(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        return x.reshape(1, -1)
    if x.ndim != 2:
        raise ValueError("x must be a 1D or 2D array")
    return x


def _check_mean_cov(mean: np.ndarray, cov: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    if mean.ndim != 1:
        raise ValueError("mean must be 1D")
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        raise ValueError("cov must be a square 2D array")

    d = mean.shape[0]
    if cov.shape != (d, d):
        raise ValueError(f"cov must have shape ({d}, {d})")

    if not np.allclose(cov, cov.T, atol=1e-12, rtol=0):
        raise ValueError("cov must be symmetric")

    return mean, cov


def mvn_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray | float:
    """Log-PDF of a multivariate normal using NumPy only (SPD covariance required).

    Args:
        x: shape (d,) or (n, d)
        mean: shape (d,)
        cov: shape (d, d), symmetric positive definite

    Returns:
        logpdf values with shape (n,) (or scalar if x is (d,)).
    """
    mean, cov = _check_mean_cov(mean, cov)
    x2 = _as_2d(x)
    d = mean.shape[0]
    if x2.shape[1] != d:
        raise ValueError(f"x must have dimension d={d}")

    # Cholesky: cov = L L^T (requires SPD)
    L = np.linalg.cholesky(cov)

    diff = (x2 - mean)  # (n, d)
    # Solve L y = diff^T => y = L^{-1} diff^T
    y = np.linalg.solve(L, diff.T)  # (d, n)
    maha2 = np.sum(y**2, axis=0)  # (n,)

    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + log_det)
    out = log_norm - 0.5 * maha2

    if np.asarray(x).ndim == 1:
        return float(out[0])
    return out


def mvn_pdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    return np.exp(mvn_logpdf(x, mean, cov))


def mvn_rvs(
    mean: np.ndarray,
    cov: np.ndarray,
    size: int,
    rng: np.random.Generator | None = None,
) -> np.ndarray:
    """Sample from N(mean, cov) using NumPy only (via Cholesky).

    Returns an array of shape (size, d).
    """
    if rng is None:
        rng = np.random.default_rng()

    mean, cov = _check_mean_cov(mean, cov)
    d = mean.shape[0]
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal(size=(int(size), d))
    return mean + z @ L.T


def mvn_cdf_mc(
    x: np.ndarray,
    mean: np.ndarray,
    cov: np.ndarray,
    n: int = 50_000,
    rng: np.random.Generator | None = None,
) -> tuple[float, float]:
    """Monte Carlo estimate of F(x) = P(X_1<=x_1,...,X_d<=x_d).

    Returns (estimate, standard_error).
    """
    if rng is None:
        rng = np.random.default_rng()

    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("x must be a 1D point")

    samples = mvn_rvs(mean, cov, size=int(n), rng=rng)
    hits = np.all(samples <= x[None, :], axis=1)
    p = float(np.mean(hits))
    se = float(np.sqrt(p * (1.0 - p) / n))
    return p, se


# Quick spot-check vs SciPy
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.5, 0.4], [0.4, 0.8]])
x_test = np.array([[0.0, 0.0], [1.0, -1.5], [2.0, -0.5]])

ours = mvn_logpdf(x_test, mu, Sigma)
theirs = multivariate_normal(mean=mu, cov=Sigma).logpdf(x_test)
print("max |logpdf_ours - logpdf_scipy| =", float(np.max(np.abs(ours - theirs))))
max |logpdf_ours - logpdf_scipy| = 0.0

4) Moments & Properties#

Let \(X \sim \mathcal{N}(\mu,\Sigma)\) in dimension \(d\).

Moments#

| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\mu\) |
| Covariance | \(\mathrm{Cov}(X)=\Sigma\) |
| Variance of component \(i\) | \(\mathrm{Var}(X_i)=\Sigma_{ii}\) |
| Variance of linear combination \(a^\top X\) | \(\mathrm{Var}(a^\top X)=a^\top\Sigma a\) |
| Skewness | 0 (all odd central moments are 0) |
| Kurtosis | any 1D marginal has kurtosis 3; Mardia kurtosis \(\beta_{2,d}=d(d+2)\) |

A common multivariate kurtosis measure (Mardia) is

\[ \beta_{2,d} = \mathbb{E}\big[\delta^4(X)\big] = \mathbb{E}\big[\big((X-\mu)^\top\Sigma^{-1}(X-\mu)\big)^2\big] = d(d+2). \]

MGF and characteristic function#

For \(t\in\mathbb{R}^d\),

\[ M_X(t)=\mathbb{E}[e^{t^\top X}]=\exp\left(t^\top\mu + \tfrac{1}{2} t^\top\Sigma t\right), \]

and the characteristic function is

\[ \varphi_X(t)=\mathbb{E}[e^{i t^\top X}]=\exp\left(i t^\top\mu - \tfrac{1}{2} t^\top\Sigma t\right). \]
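Both formulas can be sanity-checked by Monte Carlo; here is a small check of the MGF at one arbitrarily chosen \(t\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.5, 0.4], [0.4, 0.8]])
t = np.array([0.3, -0.2])

# Sample X ~ N(mu, Sigma) via Cholesky
L = np.linalg.cholesky(Sigma)
X = mu + rng.standard_normal((200_000, 2)) @ L.T

mgf_mc = float(np.mean(np.exp(X @ t)))               # E[exp(t^T X)], empirical
mgf_th = float(np.exp(t @ mu + 0.5 * t @ Sigma @ t))  # closed form
print("MC:", mgf_mc, " theory:", mgf_th)
```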

Entropy#

The differential entropy (in nats) is

\[ H(X)=\tfrac{1}{2}\log\big((2\pi e)^d |\Sigma|\big). \]
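A one-line check of this formula against SciPy's entropy method (the covariance is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
d = Sigma.shape[0]

# H(X) = 1/2 * log((2*pi*e)^d * |Sigma|), in nats
H_formula = 0.5 * np.log((2.0 * np.pi * np.e) ** d * np.linalg.det(Sigma))
H_scipy = float(multivariate_normal(mean=np.zeros(d), cov=Sigma).entropy())
print(H_formula, H_scipy)
```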

Key closure properties#

  • Affine transform: if \(X\sim\mathcal{N}(\mu,\Sigma)\) then \(Y=AX+b\) is normal with \(\mathbb{E}[Y]=A\mu+b\) and \(\mathrm{Cov}(Y)=A\Sigma A^\top\).

  • Marginals: any subvector of \(X\) is multivariate normal.

  • Conditionals: \(X_A\mid X_B=b\) is multivariate normal (with closed-form mean/cov).

  • Uncorrelated implies independent (special to the jointly Gaussian case): if \(\mathrm{Cov}(X_i,X_j)=0\), then \(X_i\) and \(X_j\) are independent.
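The conditional closure can be made concrete: for a partition \(X=(X_A,X_B)\), the standard formulas are \(\mu_{A\mid B}=\mu_A+\Sigma_{AB}\Sigma_{BB}^{-1}(b-\mu_B)\) and \(\Sigma_{A\mid B}=\Sigma_{AA}-\Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}\). A minimal bivariate sketch (unit variances, correlation \(\rho\), observed value chosen arbitrarily):

```python
import numpy as np

rho = 0.6
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, rho], [rho, 1.0]])

b = 1.5  # observed value of X2

# X1 | X2 = b is normal with:
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (b - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
print(cond_mean, cond_var)  # approximately 0.9 and 0.64
```

Note that the conditional variance shrinks from 1 to \(1-\rho^2\) and does not depend on the observed value \(b\).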

5) Parameter Interpretation#

  • The mean vector \(\mu\) is a location parameter: it shifts the center of mass.

  • The covariance matrix \(\Sigma\) controls:

    • scale (overall spread),

    • anisotropy (different variance along different directions),

    • correlation (tilt/rotation of the ellipsoids).

A helpful view is the eigendecomposition:

\[ \Sigma = Q\Lambda Q^\top, \]

where columns of \(Q\) are orthonormal eigenvectors and \(\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_d)\). Then:

  • eigenvectors give the principal axes of the density,

  • eigenvalues give the variances along those axes.

Increasing an eigenvalue stretches the ellipsoid in that direction; changing correlation rotates the ellipsoid.
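A short numerical illustration of the eigendecomposition view (the covariance values are arbitrary):

```python
import numpy as np

Sigma = np.array([[2.0, 1.2], [1.2, 1.0]])

# Symmetric eigendecomposition: Sigma = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(Sigma)

print("eigenvalues (axis variances):", lam)  # ascending order
print("eigenvectors (principal axes):\n", Q)
print("1-sigma semi-axis lengths:", np.sqrt(lam))
```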

# How correlation changes the shape (d=2, unit variances)
mu = np.zeros(2)
rhos = [-0.8, 0.0, 0.8]

x1 = np.linspace(-3.5, 3.5, 160)
x2 = np.linspace(-3.5, 3.5, 160)
X1, X2 = np.meshgrid(x1, x2)
grid = np.column_stack([X1.ravel(), X2.ravel()])

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=[f"correlation ρ = {rho}" for rho in rhos],
    horizontal_spacing=0.06,
)

for j, rho in enumerate(rhos, start=1):
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    Z = mvn_pdf(grid, mu, Sigma).reshape(X1.shape)
    fig.add_trace(
        go.Contour(
            x=x1,
            y=x2,
            z=Z,
            contours=dict(showlabels=False),
            colorscale="Viridis",
            line_smoothing=0.8,
            showscale=(j == 3),
            colorbar=dict(title="pdf") if j == 3 else None,
        ),
        row=1,
        col=j,
    )
    fig.update_xaxes(title_text="x1", row=1, col=j)
    fig.update_yaxes(title_text="x2", row=1, col=j)

fig.update_layout(title="Effect of correlation on the PDF (unit variances)", height=360)
fig.show()

6) Derivations#

Below are standard derivations/sketches that are useful to remember.

Expectation and covariance#

Using the generative representation \(X=\mu+LZ\) with \(Z\sim\mathcal{N}(0,I)\) and \(\Sigma=LL^\top\):

\[ \mathbb{E}[X] = \mathbb{E}[\mu + LZ] = \mu + L\,\mathbb{E}[Z] = \mu. \]

For the covariance:

\[ \mathrm{Cov}(X) = \mathrm{Cov}(LZ) = L\,\mathrm{Cov}(Z)\,L^\top = L I L^\top = \Sigma. \]

Likelihood (i.i.d. data)#

Let \(x_1,\ldots,x_n\) be i.i.d. from \(\mathcal{N}(\mu,\Sigma)\). The log-likelihood is

\[ \ell(\mu,\Sigma) = -\frac{n}{2}\Big(d\log(2\pi) + \log|\Sigma|\Big) - \frac{1}{2}\sum_{i=1}^n (x_i-\mu)^\top\Sigma^{-1}(x_i-\mu). \]

The maximum-likelihood estimators (MLEs) are:

\[ \hat\mu = \bar x,\qquad \hat\Sigma_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^n (x_i-\bar x)(x_i-\bar x)^\top. \]

(The unbiased sample covariance replaces \(1/n\) by \(1/(n-1)\).)

# Verify the mean/covariance derivation empirically
d = 3
mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array(
    [
        [2.0, 0.3, -0.2],
        [0.3, 1.0, 0.4],
        [-0.2, 0.4, 1.5],
    ]
)

X = mvn_rvs(mu_true, Sigma_true, size=120_000, rng=rng)

mu_hat = X.mean(axis=0)
cov_hat_mle = np.cov(X, rowvar=False, bias=True)  # 1/n

print("mu_true:", mu_true)
print("mu_hat :", mu_hat)
print("\nSigma_true:\n", Sigma_true)
print("\nSigma_hat (MLE, 1/n):\n", cov_hat_mle)

# Compare MLE to SciPy's fit
mu_fit, cov_fit = multivariate_normal.fit(X)
print("\nmax |mu_fit - mu_hat| =", float(np.max(np.abs(mu_fit - mu_hat))))
print("max |cov_fit - cov_hat_mle| =", float(np.max(np.abs(cov_fit - cov_hat_mle))))

# Log-likelihood at the true params vs at the fitted params
ll_true = float(np.sum(multivariate_normal(mean=mu_true, cov=Sigma_true).logpdf(X)))
ll_fit = float(np.sum(multivariate_normal(mean=mu_fit, cov=cov_fit).logpdf(X)))
print("\nlog-likelihood (true params):", ll_true)
print("log-likelihood (fitted MLE) :", ll_fit)
mu_true: [ 1.  -2.   0.5]
mu_hat : [ 0.99764 -2.00281  0.50352]

Sigma_true:
 [[ 2.   0.3 -0.2]
 [ 0.3  1.   0.4]
 [-0.2  0.4  1.5]]

Sigma_hat (MLE, 1/n):
 [[ 2.01425  0.3019  -0.19628]
 [ 0.3019   0.99606  0.39907]
 [-0.19628  0.39907  1.49794]]

max |mu_fit - mu_hat| = 0.0
max |cov_fit - cov_hat_mle| = 4.440892098500626e-16

log-likelihood (true params): -564889.0870789137
log-likelihood (fitted MLE) : -564884.8911281398

7) Sampling & Simulation#

NumPy-only algorithm (Cholesky)#

To sample \(X\sim\mathcal{N}(\mu,\Sigma)\):

  1. Compute the Cholesky factor \(L\) of the covariance: \(\Sigma = LL^\top\).

  2. Draw \(Z\in\mathbb{R}^d\) with i.i.d. standard normal entries.

  3. Return \(X = \mu + LZ\).

This works because affine transformations of Gaussians are Gaussian.

Cost:

  • one Cholesky factorization costs \(\mathcal{O}(d^3)\),

  • each sample costs \(\mathcal{O}(d^2)\) for the matrix-vector multiply.

If \(\Sigma\) is nearly singular, Cholesky can fail; common fixes include adding a small diagonal “jitter” \(\varepsilon I\).
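A minimal jitter-retry sketch (the helper name and escalation schedule are illustrative, not a library API):

```python
import numpy as np

def chol_with_jitter(Sigma: np.ndarray, max_tries: int = 6, eps0: float = 1e-10) -> np.ndarray:
    """Cholesky with increasing diagonal jitter on failure (illustrative helper)."""
    eps = 0.0
    d = Sigma.shape[0]
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(Sigma + eps * np.eye(d))
        except np.linalg.LinAlgError:
            eps = eps0 if eps == 0.0 else eps * 10.0
    raise np.linalg.LinAlgError("covariance not PD even after jitter")

# Rank-deficient covariance: plain Cholesky fails, a tiny jitter fixes it
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])
L = chol_with_jitter(Sigma)
print(np.allclose(L @ L.T, Sigma, atol=1e-6))  # True
```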

# Sampling demo: squared Mahalanobis distance should look chi-square(d)
d = 4
mu0 = np.zeros(d)
A = rng.normal(size=(d, d))
Sigma0 = A @ A.T + 0.5 * np.eye(d)  # SPD

X = mvn_rvs(mu0, Sigma0, size=50_000, rng=rng)

# Compute delta^2(x) = (x-mu)^T Sigma^{-1} (x-mu) without forming Sigma^{-1}
L = np.linalg.cholesky(Sigma0)
Y = np.linalg.solve(L, (X - mu0).T)  # (d, n)
delta2 = np.sum(Y**2, axis=0)

# Compare moments with chi-square(d)
print("E[delta^2] empirical:", float(np.mean(delta2)), "theory:", d)
print("Var[delta^2] empirical:", float(np.var(delta2)), "theory:", 2 * d)
E[delta^2] empirical: 4.007935607274497 theory: 4
Var[delta^2] empirical: 8.065157788886816 theory: 8

8) Visualization#

We’ll focus on \(d=2\) so we can visualize:

  • PDF as contours

  • CDF as a heatmap of \(F(x_1,x_2)=\mathbb{P}(X_1\le x_1, X_2\le x_2)\)

  • Monte Carlo samples as a scatter plot

mu2 = np.array([0.5, -0.5])
Sigma2 = np.array([[1.0, 0.75], [0.75, 1.8]])

# Grid
x1 = np.linspace(-3.5, 4.0, 180)
x2 = np.linspace(-4.5, 3.5, 180)
X1, X2 = np.meshgrid(x1, x2)
grid = np.column_stack([X1.ravel(), X2.ravel()])

# PDF via our NumPy implementation
Z_pdf = mvn_pdf(grid, mu2, Sigma2).reshape(X1.shape)

# Samples
S = mvn_rvs(mu2, Sigma2, size=2_000, rng=rng)

fig = go.Figure()
fig.add_trace(
    go.Contour(
        x=x1,
        y=x2,
        z=Z_pdf,
        contours=dict(showlabels=False),
        colorscale="Viridis",
        line_smoothing=0.8,
        showscale=True,
        name="PDF",
    )
)
fig.add_trace(
    go.Scatter(
        x=S[:, 0],
        y=S[:, 1],
        mode="markers",
        marker=dict(size=4, color="rgba(0,0,0,0.35)"),
        name="samples",
    )
)
fig.update_layout(
    title="Multivariate normal: PDF contours + Monte Carlo samples (d=2)",
    xaxis_title="x1",
    yaxis_title="x2",
    legend=dict(orientation="h"),
)
fig.show()
# CDF heatmap using SciPy's multivariate_normal.cdf (numerical evaluation)
rv2 = multivariate_normal(mean=mu2, cov=Sigma2)
Z_cdf = rv2.cdf(grid).reshape(X1.shape)

fig = go.Figure(
    data=go.Heatmap(
        x=x1,
        y=x2,
        z=Z_cdf,
        colorscale="Blues",
        colorbar=dict(title="F(x1,x2)"),
    )
)
fig.update_layout(
    title="Multivariate normal: CDF heatmap (d=2)",
    xaxis_title="x1",
    yaxis_title="x2",
)
fig.show()

# A quick MC check at one point
x_point = np.array([0.0, 0.0])
p_mc, se_mc = mvn_cdf_mc(x_point, mu2, Sigma2, n=100_000, rng=rng)
p_scipy = float(rv2.cdf(x_point))
print("F(0,0) SciPy:", p_scipy)
print("F(0,0) MC  :", p_mc, "+/-", 2 * se_mc)
F(0,0) SciPy: 0.270207507991013
F(0,0) MC  : 0.27102 +/- 0.0028111788246214433

9) SciPy Integration (scipy.stats.multivariate_normal)#

SciPy provides the multivariate normal as scipy.stats.multivariate_normal.

rv = scipy.stats.multivariate_normal(mean=mu, cov=Sigma)

Common methods:

  • pdf, logpdf

  • cdf (numerical)

  • rvs

  • entropy

  • fit (MLE for mean and covariance)

Note: pdf can underflow in moderate dimension; prefer logpdf for likelihood work.

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, -0.3], [-0.3, 1.0]])

rv = multivariate_normal(mean=mu, cov=Sigma)

pts = np.array([[1.0, 2.0], [0.0, 0.0], [2.0, 3.0]])

print("pdf   :", rv.pdf(pts))
print("logpdf:", rv.logpdf(pts))
print("cdf   :", rv.cdf(pts))
print("entropy:", float(rv.entropy()), "nats")

samples = rv.rvs(size=5_000, random_state=rng)

# Fit parameters (MLE)
mu_fit, cov_fit = multivariate_normal.fit(samples)
print("\nmu_fit:", mu_fit)
print("cov_fit:\n", cov_fit)

# Compare to our NumPy-only logpdf
ours = mvn_logpdf(pts, mu, Sigma)
theirs = rv.logpdf(pts)
print("\nmax |logpdf_ours - logpdf_scipy| =", float(np.max(np.abs(ours - theirs))))
pdf   : [0.11516 0.00797 0.04488]
logpdf: [-2.16143 -4.83159 -3.10384]
cdf   : [0.21598 0.00246 0.6249 ]
entropy: 3.1614286874386144 nats

mu_fit: [0.98169 2.02102]
cov_fit:
 [[ 1.9965  -0.30165]
 [-0.30165  0.98532]]

max |logpdf_ours - logpdf_scipy| = 4.440892098500626e-16

10) Statistical Use Cases#

Hypothesis testing#

  • Mean vector testing: Hotelling’s \(T^2\) tests \(H_0: \mu=\mu_0\) for i.i.d. Gaussian data.

  • Covariance testing: tests about \(\Sigma\) (equality, sphericity) often use Wishart theory.

  • Normality checks: multivariate normality can be assessed with Q–Q plots of Mahalanobis distances (should be \(\chi^2_d\)).

Bayesian modeling#

  • A multivariate normal is a natural prior for a vector of parameters.

  • With a Gaussian likelihood and known covariance, the posterior for the mean is again Gaussian.

  • For unknown mean and covariance, the conjugate prior is Normal–Inverse-Wishart.

Generative modeling#

  • Gaussian mixture models (GMMs) approximate complex densities by mixing Gaussians.

  • Many latent-variable models assume Gaussian latent variables (e.g., factor analysis, VAEs).

# Hotelling's T^2 test demo: H0: mu = mu0
# Under H0: (n-d)/(d*(n-1)) * T^2 ~ F_{d, n-d}
n = 60
d = 3

mu_true = np.array([0.2, -0.1, 0.3])
Sigma_true = np.array(
    [
        [1.2, 0.2, 0.0],
        [0.2, 1.0, 0.4],
        [0.0, 0.4, 1.5],
    ]
)
X = mvn_rvs(mu_true, Sigma_true, size=n, rng=rng)

mu0 = np.zeros(d)  # null hypothesis

xbar = X.mean(axis=0)
S_unbiased = np.cov(X, rowvar=False, bias=False)  # 1/(n-1)

# Solve S^{-1}(xbar-mu0) without explicit inverse
diff = (xbar - mu0).reshape(-1, 1)
sol = np.linalg.solve(S_unbiased, diff)
T2 = float(n * (diff.T @ sol).item())  # .item() extracts the (1,1) scalar explicitly

F_stat = (n - d) / (d * (n - 1)) * T2
p_value = float(stats.f.sf(F_stat, dfn=d, dfd=n - d))

print("T^2 statistic:", T2)
print("F statistic :", F_stat)
print("p-value     :", p_value)
T^2 statistic: 8.301062635065886
F statistic : 2.673223560444946
p-value     : 0.055836249807211434
# Multivariate normality check: Mahalanobis-distance Q-Q vs chi-square
n = 600
d = 2

mu_true = np.array([0.0, 0.0])
Sigma_true = np.array([[1.0, 0.6], [0.6, 1.3]])
X = mvn_rvs(mu_true, Sigma_true, size=n, rng=rng)

mu_hat = X.mean(axis=0)
S_unbiased = np.cov(X, rowvar=False, bias=False)

# delta^2_i = (x_i-mu_hat)^T S^{-1} (x_i-mu_hat)
L = np.linalg.cholesky(S_unbiased)
Y = np.linalg.solve(L, (X - mu_hat).T)
delta2 = np.sum(Y**2, axis=0)

delta2_sorted = np.sort(delta2)
q = (np.arange(1, n + 1) - 0.5) / n
chi2_q = stats.chi2.ppf(q, df=d)

maxv = float(max(delta2_sorted.max(), chi2_q.max()))

fig = go.Figure()
fig.add_trace(go.Scatter(x=chi2_q, y=delta2_sorted, mode="markers", name="data"))
fig.add_trace(go.Scatter(x=[0, maxv], y=[0, maxv], mode="lines", name="y = x"))
fig.update_layout(
    title="Mahalanobis-distance Q-Q plot (should be ~ linear under MVN)",
    xaxis_title="Chi-square quantiles (df=d)",
    yaxis_title="Sorted Mahalanobis distances (delta^2)",
)
fig.show()
# Bayesian update for the mean (known covariance):
# Prior:   mu ~ N(m0, V0)
# Likelihood: x_i | mu ~ N(mu, Sigma)
# Posterior: mu | X ~ N(mn, Vn)

d = 2
Sigma = np.array([[1.0, 0.4], [0.4, 1.5]])

m0 = np.array([0.0, 0.0])
V0 = 4.0 * np.eye(d)  # diffuse prior

mu_true = np.array([1.0, -1.0])
X = mvn_rvs(mu_true, Sigma, size=40, rng=rng)

# Work in precision form
Prec0 = np.linalg.inv(V0)
PrecL = np.linalg.inv(Sigma)

Vn = np.linalg.inv(Prec0 + X.shape[0] * PrecL)
mn = Vn @ (Prec0 @ m0 + PrecL @ X.sum(axis=0))

print("prior mean:", m0)
print("posterior mean:", mn)
print("true mean:", mu_true)
print("posterior cov:\n", Vn)
prior mean: [0. 0.]
posterior mean: [ 0.93823 -0.82923]
true mean: [ 1. -1.]
posterior cov:
 [[0.02482 0.00985]
 [0.00985 0.03713]]

11) Pitfalls#

  • Invalid covariance: \(\Sigma\) must be symmetric positive (semi)definite. Small asymmetries from floating-point noise can break factorization.

  • Near-singularity: Cholesky may fail if \(\Sigma\) is ill-conditioned. A common fix is adding jitter: \(\Sigma \leftarrow \Sigma + \varepsilon I\).

  • Avoid explicit matrix inverses: compute quadratic forms with solves (as we did) for stability.

  • Underflow in pdf: in moderate/high dimension, densities can be tiny; use logpdf for likelihoods.

  • Interpreting correlation: zero correlation implies independence only for jointly Gaussian variables.

  • CDF in high dimension: multivariate CDF evaluation can become expensive or less accurate as \(d\) grows.
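To make the underflow pitfall concrete (the dimension is chosen just so the effect shows up in float64):

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 1000
rv = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
x = np.zeros(d)  # the mode: the highest-density point there is

print("pdf   :", rv.pdf(x))     # 0.0 -- underflows even at the mode
print("logpdf:", rv.logpdf(x))  # finite: -(d/2) * log(2*pi)
```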

12) Summary#

  • multivariate_normal is a continuous distribution on \(\mathbb{R}^d\) with parameters mean \(\mu\) and covariance \(\Sigma\).

  • Its density is controlled by the Mahalanobis distance \((x-\mu)^\top\Sigma^{-1}(x-\mu)\).

  • Key formulas: \(\mathbb{E}[X]=\mu\), \(\mathrm{Cov}(X)=\Sigma\), \(M_X(t)=\exp(t^\top\mu + \tfrac{1}{2}t^\top\Sigma t)\), and \(H(X)=\tfrac{1}{2}\log((2\pi e)^d|\Sigma|)\).

  • Sampling is easy via Cholesky: \(X=\mu+LZ\) with \(Z\sim\mathcal{N}(0,I)\).

  • SciPy’s scipy.stats.multivariate_normal provides pdf/logpdf/cdf/rvs/fit.

References#

  • SciPy docs: scipy.stats.multivariate_normal

  • Any standard multivariate statistics text (e.g., Anderson, An Introduction to Multivariate Statistical Analysis)